Oracle and Human Baselines for Native Language Identification

Authors

  • Shervin Malmasi
  • Joel R. Tetreault
  • Mark Dras
Abstract

We examine different ensemble methods, including an oracle, to estimate the upper limit of classification accuracy for Native Language Identification (NLI). The oracle outperforms state-of-the-art systems by over 10%, and the results indicate that for many misclassified texts the correct class label receives a significant portion of the ensemble votes, often being the runner-up. We also present a pilot study of human performance for NLI, the first such experiment. While some participants achieve modest results on our simplified setup with 5 L1s, they do not outperform our NLI system, and this performance gap is likely to widen on the standard NLI setup.
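The abstract does not spell out how the oracle is computed. As a minimal sketch, assuming the common definition in which an oracle counts a text as correct whenever at least one ensemble member predicts the gold label, the comparison with plurality voting and the runner-up analysis might look like the following. The function names, the three-classifier ensemble, and the toy L1 labels are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def plurality_vote(predictions):
    """Return the most-voted label; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]


def oracle_vote(predictions, gold):
    """Assumed oracle: pick the gold label if any ensemble member predicted it."""
    return gold if gold in predictions else plurality_vote(predictions)


def accuracy(chosen, gold_labels):
    """Fraction of texts whose chosen label matches the gold label."""
    return sum(c == g for c, g in zip(chosen, gold_labels)) / len(gold_labels)


def runner_up_hits(all_predictions, gold_labels):
    """Count texts where plurality voting is wrong but gold is the second most-voted label."""
    hits = 0
    for predictions, gold in zip(all_predictions, gold_labels):
        ranked = [label for label, _ in Counter(predictions).most_common()]
        if ranked[0] != gold and len(ranked) > 1 and ranked[1] == gold:
            hits += 1
    return hits


# Toy data (hypothetical): one row per text, one column per classifier.
ensemble_outputs = [
    ["ARA", "ARA", "FRE"],
    ["FRE", "FRE", "SPA"],
    ["KOR", "JPN", "JPN"],
    ["TUR", "TUR", "TUR"],
]
gold_labels = ["ARA", "SPA", "JPN", "HIN"]

plurality_acc = accuracy(
    [plurality_vote(p) for p in ensemble_outputs], gold_labels
)
oracle_acc = accuracy(
    [oracle_vote(p, g) for p, g in zip(ensemble_outputs, gold_labels)], gold_labels
)
runner_ups = runner_up_hits(ensemble_outputs, gold_labels)
```

On this toy data the plurality vote gets 2 of 4 texts right (0.5) and the assumed oracle gets 3 of 4 (0.75). The one text the oracle recovers is the one where the gold label is the runner-up in the vote, which is the kind of case the abstract describes.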

Similar resources

Fusion of Simple Models for Native Language Identification

In this paper we describe the approaches we explored for the 2017 Native Language Identification shared task. We focused on simple word and sub-word units avoiding heavy use of hand-crafted features. Following recent trends, we explored linear and neural networks models to attempt to compensate for the lack of rich feature use. Initial efforts yielded f1-scores of 82.39% and 83.77% in the devel...

Multilingual native language identification

We present the first study of Native Language Identification (NLI) applied to text written in languages other than English, using data from six languages. NLI is the task of predicting an author’s first language (L1) using only their writings in a second language (L2), with applications in Second Language Acquisition and forensic linguistics. Most research to date has focused on English but the...

The INTERSPEECH 2016 Computational Paralinguistics Challenge: Deception, Sincerity & Native Language

The INTERSPEECH 2016 Computational Paralinguistics Challenge addresses three different problems for the first time in research competition under well-defined conditions: classification of deceptive vs. non-deceptive speech, the estimation of the degree of sincerity, and the identification of the native language out of eleven L1 classes of English L2 speakers. In this paper, we describe these su...

Beliefs about Non-Native Teachers in English as an International Language: A Positioning Analysis of Iranian Language Teachers’ Voices

The unprecedented growth of English and arrival of English as an International Language (EIL) has generated a new fledged argument about English language teachers’ role and status around the world. To date, much of the debate on the native/non-native distinction in EIL settings and factors contributing to sharpen distinctions has remained unsettled. This gap motivated this study on the English ...

Identification and Introducing the Multi-Layered Identity of Human Characters on Stone Motifs of Chahleshtar Qajar Castle

The Chaleshtar castle in Chahar Mahal and Bakhtiari is decorated with varied and unique stone herbal, human and animal ornaments. In the decoration of this Qajar building, the traces of the three national, religious, and foreign patterns of the Qajar period are observed, especially in human designs. Resemblance of human ornaments in this building with aforementioned patterns influenced by nativ...

Publication date: 2015